Work Environment Survey (WES)
conducted by BC Stats for employees within BC Public Service
measures the health of work environments and identifies areas for improvement
~80 multiple choice questions (5 point scale) and 2 open-ended questions
2020-06-19
Question 1
Example: "Better health and social benefits should be provided."
Question 2
Example: "Now we have more efficient vending machines."
*Note: these examples are fake comments for privacy reasons.
What one thing would you like your organization to focus on to improve your work environment?
| Comments* | CPD | CB | EWC | … | CB_Improve_benefits | CB_Increase_salary |
|---|---|---|---|---|---|---|
| Better health and social benefits should be provided | 0 | 1 | 0 | … | 1 | 0 |
Theme: CB = Compensation and Benefits
Subtheme: CB_Improve_benefits = Improve benefits
*Note: this is a fake comment as an example of the data.
# 1) Build a model to automate multi-label text classification that:
predicts label(s) for the main themes of Questions 1 and 2
predicts label(s) for Question 1's subthemes
# 2) Build an app for visualizations on text data:
identify and compare common words used for each question
identify trends on concerns (Q1) and appreciations (Q2) for BC ministries over the given years
There are 12 themes and 63 subthemes that comments can be encoded into.
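The indicator encoding shown in the table above can be sketched in a few lines; the theme codes below are a small illustrative subset of the 12:

```python
# Toy multi-label binarizer: one binary column per known theme code,
# mirroring the table above (CB = Compensation and Benefits, etc.).
THEMES = ["CPD", "CB", "EWC", "OTH"]  # illustrative subset of the 12 themes

def binarize(labels, classes=THEMES):
    """Turn the set of themes assigned to a comment into a 0/1 vector."""
    return [1 if c in labels else 0 for c in classes]

# "Better health and social benefits should be provided" -> theme CB
print(binarize({"CB"}))  # [0, 1, 0, 0]
```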
Example of a comment that would be flagged for containing personal information: "George and I love when the department gives us new coupons!"
The TF-IDF vectorizer weights tokens by how informative they are across documents, rather than using raw token counts (as CountVectorizer does).
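A toy sketch of the difference, with made-up comments (this is not scikit-learn's exact formula, which also applies smoothing and normalization):

```python
import math
from collections import Counter

# Illustrative documents; terms common to many documents get
# down-weighted by TF-IDF relative to their raw counts.
docs = [
    "better health benefits",
    "better vending machines",
    "health benefits improved",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

def count_vector(tokens):
    """Raw token counts, as in CountVectorizer."""
    return Counter(tokens)

def doc_freq(term):
    """Number of documents containing the term."""
    return sum(term in toks for toks in tokenized)

def tfidf_vector(tokens):
    """Term frequency scaled by inverse document frequency."""
    tf = Counter(tokens)
    return {t: c * math.log(n_docs / doc_freq(t)) for t, c in tf.items()}

v = tfidf_vector(tokenized[1])
# "better" appears in 2 of 3 docs, so it is discounted relative to
# the rarer term "vending", even though both counts are 1.
```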
Classifier Chains is a multi-label classification method that links one binary classifier per label in sequence; each classifier receives the previous predictions as extra features, preserving the order and co-occurrence of labels.
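A minimal sketch of the chaining idea, using trivial rule-based stand-ins for trained binary classifiers (real chains, e.g. scikit-learn's ClassifierChain, wrap fitted estimators the same way):

```python
# Toy classifier chain: each link predicts one label from the input
# features *plus* all previously predicted labels, so correlations
# between labels are carried along the chain. The keyword rules are
# illustrative stand-ins for trained binary classifiers.

def predict_cb(features, prev):
    # First link sees only the original features.
    return 1 if "benefits" in features else 0

def predict_cb_improve(features, prev):
    # Second link also sees the CB prediction: this label can only
    # fire when the earlier label in the chain fired.
    return 1 if prev[0] == 1 and "improve" in features else 0

def chain_predict(features, chain=(predict_cb, predict_cb_improve)):
    preds = []
    for clf in chain:
        preds.append(clf(features, preds))
    return preds

print(chain_predict({"improve", "benefits"}))  # [1, 1]
print(chain_predict({"improve", "parking"}))   # [0, 0]
```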
explored several embeddings on various models
built an embedding matrix and transformed comments into padded index sequences matching the embedding size
used the saved embeddings (rather than raw text) on public cloud services, since the data contains sensitive information
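The index-and-pad step can be sketched as below, with a toy vocabulary and made-up comments (real pipelines typically use a tokenizer such as Keras's and its `pad_sequences` helper):

```python
# Toy version of turning comments into fixed-length index sequences
# that an embedding matrix can be looked up with. Comments are
# illustrative; index 0 is reserved for padding.
comments = ["better health benefits", "more efficient vending machines"]

vocab = {"<pad>": 0}
for c in comments:
    for tok in c.split():
        vocab.setdefault(tok, len(vocab))

MAXLEN = 5  # every sequence is padded/truncated to this length

def to_padded(comment, maxlen=MAXLEN):
    idx = [vocab[t] for t in comment.split()][:maxlen]
    return idx + [0] * (maxlen - len(idx))

padded = [to_padded(c) for c in comments]
# Each row now has exactly MAXLEN entries, ready for a
# (vocab_size x embedding_dim) embedding lookup.
```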
Precision-Recall curve: plots precision against recall at various decision thresholds
Source: Precision and Recall
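Tracing the curve amounts to sweeping a threshold over the model's predicted scores and recomputing both metrics at each cutoff; a toy sketch (scikit-learn's `precision_recall_curve` does this over all score cutoffs):

```python
# Toy precision/recall sweep over decision thresholds.
# y_true and y_score are made-up labels and model scores.
y_true  = [1, 0, 1, 1, 0]
y_score = [0.9, 0.8, 0.6, 0.4, 0.2]

def pr_at(threshold):
    """Precision and recall when predicting 1 for score >= threshold."""
    pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(p and t for p, t in zip(pred, y_true))
    fp = sum(p and not t for p, t in zip(pred, y_true))
    fn = sum((not p) and t for p, t in zip(pred, y_true))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Lowering the threshold trades precision for recall.
for th in (0.9, 0.6, 0.4):
    print(th, pr_at(th))
```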
| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| TF-IDF + LinearSVC | 0.50 | 0.79 | 0.63 | 0.70 |
| Fasttext + BiGRU | 0.53 | 0.75 | 0.71 | 0.73 |
| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| Bag of Words + LinearSVC | 0.45 | 0.74 | 0.64 | 0.69 |
| Ensemble Model | 0.53 | 0.83 | 0.66 | 0.74 |
Source: BC Stats Capstone 2019 Final Report, by A. Quinton, A. Pearson, F. Nie
Themes with high F1 scores (e.g. CB) can be encoded automatically using the model, while themes with low scores (e.g. OTH) should be manually verified by BC Stats.
We recommend using a combination of machine learning and manual encoding.
Subthemes are predicted based on the theme(s) our model has assigned to the comment.
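This hierarchical step can be sketched as restricting candidate subthemes to those whose parent theme was predicted; the subtheme names below follow the data's `THEME_Subtheme` naming scheme, with `CPD_Training` a hypothetical example:

```python
# Toy hierarchical filter: a subtheme is only considered when its
# parent theme was predicted for the comment. CB_Improve_benefits and
# CB_Increase_salary appear in the data; CPD_Training is hypothetical.
SUBTHEMES = ["CB_Improve_benefits", "CB_Increase_salary", "CPD_Training"]

def parent(subtheme):
    """Theme code is the prefix before the first underscore."""
    return subtheme.split("_", 1)[0]

def candidate_subthemes(predicted_themes):
    return [s for s in SUBTHEMES if parent(s) in predicted_themes]

print(candidate_subthemes({"CB"}))
# ['CB_Improve_benefits', 'CB_Increase_salary']
```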
Observed a linear relationship between the frequencies of common words in Question 1 and Question 2.
Validated using the themes from Question 1 to label comments from Question 2.
expect better results with more data
use embeddings and padded data on public cloud services (Google Colab, AWS) to apply complex machine learning algorithms to sensitive data
BERT
Topic modelling for Question 2 after removing commonly repeated words